Learning to Match Names Across Languages
Abstract
We report on research into matching names written in different scripts across languages. We explore two trainable approaches, both based on comparing pronunciations. The first, a cross-lingual approach, uses an automatic name-matching program that exploits rules derived from human comparisons of the phonologies of the two languages. The second, a monolingual approach, relies only on automatic comparison of the phonological representations of each name pair. The alignments produced by each approach are fed to a machine learning algorithm. Results show that, with the monolingual approach, machine-learning-based comparison of person names in English and Chinese achieves an F-measure of over 97.0.
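The monolingual approach compares the phonological representations of the two names and passes the resulting alignment to a learner. What follows is a minimal sketch of how such an alignment and a few alignment features might be computed. It assumes both names have already been transcribed into a shared phoneme inventory. It uses plain unit-cost Levenshtein alignment as a stand-in, which is not necessarily the alignment method used in this work. The functions `align` and `alignment_features`, the feature names, and the single-letter "phoneme" transcriptions in the usage example are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Phoneme = str
# An aligned position: (phoneme from name A or None, phoneme from name B or None).
AlignedPair = Tuple[Optional[Phoneme], Optional[Phoneme]]


def align(a: List[Phoneme], b: List[Phoneme]) -> List[AlignedPair]:
    """Hypothetical helper: unit-cost Levenshtein alignment of two phoneme sequences."""
    n, m = len(a), len(b)
    # dist[i][j] holds the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + sub_cost,  # match or substitution
                             dist[i - 1][j] + 1,             # deletion from a
                             dist[i][j - 1] + 1)             # insertion from b
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[AlignedPair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def alignment_features(pairs: List[AlignedPair]) -> Dict[str, float]:
    """Hypothetical feature set summarizing an alignment for a downstream classifier."""
    if not pairs:
        return {"match_ratio": 0.0, "substitutions": 0.0, "gaps": 0.0}
    matches = sum(1 for x, y in pairs if x is not None and x == y)
    substitutions = sum(1 for x, y in pairs if x is not None and y is not None and x != y)
    gaps = sum(1 for x, y in pairs if x is None or y is None)
    return {"match_ratio": matches / len(pairs),
            "substitutions": float(substitutions),
            "gaps": float(gaps)}


# Usage with simplified, hypothetical transcriptions (one letter per "phoneme").
pairs = align(list("klinton"), list("kelindun"))
print(alignment_features(pairs))
```

Feature vectors like these could be computed for both matching and non-matching name pairs and then used as training input to a standard classifier. The abstract does not say which learning algorithm the authors used, so that choice is left open here.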